Roll-Forward and Rollback Recovery: Performance-Reliability Trade-Off
Authors
Dhiraj K. Pradhan, Nitin H. Vaidya
Department of Computer Science, Texas A&M University, College Station, TX 77843-3112
Abstract
Performance and reliability achieved by a modular redundant system depend on the recovery scheme used. Typically, a gain in performance using comparable resources results in reduced reliability. Several high-performance computers are noted for small mean time to failure. Performance is measured here in terms of the mean and variance of the task completion time; reliability is a task-based measure defined as the probability that a task is completed correctly. Two roll-forward schemes are compared with two rollback schemes for achieving recovery in duplex systems. The roll-forward schemes discussed here are based on a roll-forward checkpointing concept proposed in [5-8]. Roll-forward recovery schemes achieve significantly better performance than rollback schemes by avoiding rollback in the most common fault scenarios. It is shown that the roll-forward schemes improve performance with only a small loss in reliability as compared to rollback schemes.
Related Resources
On Low-Cost Error Containment and Recovery Methods for Guarded Software Upgrading
To assure dependable onboard evolution, we have developed a methodology called guarded software upgrading (GSU). In this paper, we focus on a low-cost approach to error containment and recovery for GSU. To ensure low development cost, we exploit inherent system resource redundancies as the fault tolerance means. In order to mitigate the effect of residual software faults at low performance cost...
Responsive Roll-Forward Recovery in Embedded Real-Time Systems
Roll-forward checkpointing schemes [Long et al. 1990; Pradhan and Vaidya 1992] are developed in order to avoid rollback in the presence of independent faults and increase the possibility that a task completes within a tight deadline. Despite the adoption of roll-forward recovery, these schemes are not necessarily appropriate for time-critical applications because interactions with the extern...
Roll-forward error recovery in embedded real-time systems
Roll-forward checkpointing schemes [8][10] are developed in order to avoid rollback in the presence of independent faults and to increase the possibility that a task completes within a tight deadline. However, despite the adoption of roll-forward recovery, these schemes are not necessarily appropriate for time-critical applications because interactions with the external environment and commu...
The Cost of Recovery in Message Logging Protocols
Past research in message logging has focused on studying the relative overhead imposed by pessimistic, optimistic, and causal protocols during failure-free executions. In this paper, we give the first experimental evaluation of the performance of these protocols during recovery. Our results suggest that applications face a complex trade-off when choosing a message logging protocol for fault to...
Completely Asynchronous Optimistic Recovery with Minimal Rollbacks
Consider the problem of transparently recovering an asynchronous distributed computation when one or more processes fail. Basing rollback recovery on optimistic message logging and replay is desirable for several reasons, including not requiring synchronization between processes during failure-free operation. However, previous optimistic rollback recovery protocols either have required synchroni...
Publication date: 1994